By the Numbers: the Rationale for Rasch Analysis in Placement Testing
Author
Abstract
Placement tests are usually designed to assess relative language ability within the range of a particular program. Test scores are generally interpreted as measures of language ability, and students are compared and placed in accordance with them. This paper argues not only that applying the Rasch model to placement situations is warranted by the assumptions of the placement process, but also that great benefits can be achieved by examining items and persons that do not fit the Rasch model. To illustrate these points, the University of Hawai‘i English Language Institute Academic Listening Test is analyzed and discussed.
This paper uses a Rasch analysis perspective to examine the Academic Listening Test (ALT) used by the English Language Institute (ELI) at the University of Hawaii at Manoa for placement into academic listening courses. This is not a validation study of a new test but rather a reevaluation, from a Rasch measurement perspective, of a test that has been used for almost a decade. Although this is certainly not the first application of a latent trait approach to placement test analysis (e.g., Blais & Laurier, 1995; Kondo-Brown & Brown, 2000; Sasaki, 1991), the diagnostic information available with the Rasch approach is rarely exploited. This reevaluation is the first step in revising and updating the ALT.
CLARK – BY THE NUMBERS: THE RATIONALE FOR RASCH ANALYSIS IN PLACEMENT 62
Placement Testing and Test Scores
Placement testing in language programs is primarily concerned with assessing students’ language proficiency in order to create relatively homogeneous groups for instructional purposes (Bachman & Palmer, 1996; Brown, 1996). For a language school, the placement test may cover a wide range of ability and subsequent placements may range from beginning to advanced instructional classes.
In other cases, such as language support programs for international students studying in the US, this placement decision may also include the determination that the student in question does not need further language instruction because he or she has exceeded the level of instruction provided by the service program. This is often the case in the university setting, in which students for whom the language of instruction is not their native language must demonstrate a minimum level of language proficiency for conditional admittance into the university and a second, higher level of proficiency to take a full course load. In this type of situation, the placement decisions are usually made not over the whole range of language proficiency, but rather within a relatively narrow band of language ability, specifically between the admittance level and the exemption level (Brown, 1989).
Placement test scores are interpreted as measures of language ability; that is, a higher score on the test indicates a higher level of language ability and thus warrants placement in a more advanced language course. Of course, the actual designations of courses as being at a certain ability level (e.g., Intermediate Listening) tend to be arbitrary and program-specific. Given the same number of students and range of language abilities, one program may have the resources to offer small classes representing fine distinctions in ability whereas another program may merely divide the group into beginning and advanced classes. Regardless of the actual placement procedures employed, the assumption remains that the placement test is distributing students along a continuum of language abilities from which instructional groupings can be created. In fact, it is perhaps more appropriate to say that the logic of placement testing as it is usually carried out requires that the placement instruments distribute students along a continuum of language abilities in the domain of interest.
Footnote 1: An exception to this general rule would be programs which tie their courses to common proficiency rating scales (e.g., ACTFL, ASLPR), but even here differences in program resources can lead to classes composed of students from wider or narrower chunks of the scale.
The Current Study
This paper starts from the premise hinted at above, namely that the logic of placement testing, at least as it is carried out by this program and probably many other similar programs, makes the implicit assumption that the total placement test score of a student is a sufficient indicator of language abilities and thus can be directly compared across students for the purposes of placement. Therefore, if the ALT is to be useful as a measurement instrument, it should have the following characteristics: (a) a higher score on the test represents a greater level of listening ability, (b) the items are targeted to the population that the test is designed for, and (c) the items do not function differentially for subgroups of examinees. This paper will start with a description of the ALT, then outline the salient points of the Rasch model and the rationale for analyzing the test from this perspective. Next, the data used in this study will be described and the results of the analysis will be presented with reference to the necessary characteristics cited above. The final section summarizes the points made in the paper.
BACKGROUND AND RATIONALE
The Academic Listening Test (ALT)
The ALT is one of a battery of tests used to determine if newly admitted students for whom English is not a first language have sufficient English ability to take a full load of credit-bearing courses with no additional language support.
This test is administered only to students who have not provided evidence of sufficient language proficiency (a particular number of transferable credits from an English-medium institution, scores above set criteria on standardized tests, etc.) at the time of registration for classes. Thus the range of language ability to be tested is rather narrow, as students with very low ability would have been denied admission to the University outright and students with high ability have already been exempted from additional language study. Three placement decisions are possible based on the ALT score: (a) additional study at the intermediate level, (b) additional study at the advanced level, or (c) exemption from further study.
The ALT consists of four sections, the breakdown of which is shown in Table 1. All of the items use the multiple-choice format. This format was chosen primarily for practical reasons: the results of the test must be made available as soon as possible so that students can continue with the registration process, and the multiple-choice format allows the tests to be machine scored.
Table 1
Overview of the Academic Listening Test (ALT)
| Section | Task | Topic | Item numbers |
| --- | --- | --- | --- |
| Section One | Listen to a short passage and answer questions pertaining to the content | Volcanic origins of Hawaiian islands | 1–4 |
| Section One | Listen to a short passage and answer questions pertaining to the content | Development of motion pictures | 5–10 |
| Section One | Listen to a short passage and answer questions pertaining to the content | MLV Train | 11–14 |
| Section One | Listen to a short passage and answer questions pertaining to the content | Missing library book | 15–20 |
| Section Two | Determine the meaning of a word after hearing it used in a sentence | Vocabulary | 21–24 |
| Section Three | Listen to two sentences and determine the best word or phrase to connect them | Transitions | 25–29 |
| Section Four | Listen and take notes for a ten-minute lecture then answer questions pertaining to that lecture | Lecture – Culture and language | 30–40 |
The conception of language ability underlying the ALT is essentially one of communicative competence.
Though there have been revisions and reformulations in the literature (e.g., Bachman, 1990; Bachman & Palmer, 1982; Canale, 1983; Canale & Swain, 1980), the essential idea is that the ability to communicate in a language entails not only the ability to manipulate the formal structure of the language properly (organizational competence), but also the ability to produce discourse that is appropriate for the situation and context (pragmatic competence). These competencies are in turn composed of sub-competencies at finer levels of scale such that, for example, organizational competence entails grammatical competence (the formulation of grammatically appropriate sentences) and discourse competence (the arrangement of a series of grammatical sentences into a larger chunk of discourse, such as a lecture). It is assumed that gradual increases in the various sub-competencies eventually manifest themselves as an increase in overall competence. This is true of comprehension as well as production. As one becomes more proficient, all other things being equal, one is better able to handle more demanding tasks. To give an example, the ability to distinguish between the language one is studying and another language might be considered an easy task, whereas the ability to listen to a lecture on an unfamiliar topic and recount the main points would be a task that requires considerably more listening ability (Nunan, 1989). Of course, in the case of listening, the characteristics of the stimulus itself can impose greater or lesser demands on the listener even though the listening task is similar. In other words, a lecture composed of easy lexis delivered clearly with prosodic emphasis on the main points would be considerably easier than a lecture of identical length composed of uncommon words delivered in a mumbling monotone voice at breakneck speed.
In terms of word choice and discourse style, the language on the test is academic in nature. Sections One and Four represent the essential listening task of attending to a spoken message for content. The topics were chosen for their interest and generality. The language in these passages reflects what might be considered general academic language such as would be found in an introductory course. It is assumed that someone with greater listening ability would have more success at comprehending the passages. In all cases, the passages were recorded using a script, so they are artificial in the sense that they do not contain the false starts and self-corrections that might be produced by someone speaking extemporaneously. The exception to this is the lecture in Section Four, which, though scripted, was intentionally recorded in a more relaxed manner and is more akin to someone lecturing from notes than reading a prepared manuscript, complete with false starts, hesitations, and fillers.
Sections Two and Three reflect an interest in the subcomponents of language ability discussed above, namely the ability to infer the meaning of unknown words from context (Section Two) and the ability to recognize appropriate discourse structuring devices (Section Three). It has been noted that confusion can arise when dealing with components which are hierarchical in nature if one is not cognizant of the appropriate level of scale that should be considered for the measurement purpose at hand (Andrich, 2002a, 2002b). A substantive question for the analysis of the ALT is whether these four sections represent the same level of scale or not.
The Case for the Rasch Model
Dunkel, Henning, and Chaudron have argued that “unless some implicational or Guttman-type scale can be formed with monotonic increment of person ability and task difficulty in the same response matrix, whatever we choose to label listening comprehension would not qualify as a unitary measurement construct, and the reporting of unitary scores as a reflection of comparative performance would be misleading” (1993, p. 182). The Rasch model is ideally suited to this task for two reasons. First, items and persons are placed on the same metric; second, the total score is a sufficient statistic (Linacre, 1992; van der Linden, 1992). This means that, provided the data fit the model, the total score contains all of the information about an examinee’s ability and thus, “the classification of persons according to their total scores is justified” (Andrich, 1988b, p. 38).
In the Rasch model, the probability of a person succeeding on a given item depends on the ability of the person and the difficulty of the item. The more able a person is in relation to a given item, the greater the probability that the person will succeed on that item. Unlike the Guttman model, the Rasch model is probabilistic rather than deterministic and recognizes that the same total score can be arrived at by different combinations of items, with the Guttman structure being the most probable pattern (Andrich, 1985). The model has also been described as axiomatic (Bond & Fox, 2001; Wright, 1997) in the sense that, as a mathematical model, it requires data which represent a unitary construct in accordance with a particular theory of that construct (Andrich, 1989). This does not mean that the construct cannot have several psychological dimensions or components.
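The two properties described above (the total score as a sufficient statistic, and the Guttman pattern as the most probable way of reaching a given score) can be illustrated with a minimal Python sketch. It assumes the standard dichotomous Rasch formulation, P(correct) = exp(θ − b) / (1 + exp(θ − b)), with person ability θ and item difficulty b expressed in logits. The excerpt does not print this formula, so it is an assumption here; the function names and the three item difficulties are hypothetical and are not estimates from the ALT.

```python
import math
from itertools import product


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits (assumed standard formulation).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pattern_probability(pattern, ability, difficulties):
    """Probability of one full response pattern (1 = correct, 0 = incorrect)."""
    prob = 1.0
    for response, difficulty in zip(pattern, difficulties):
        p_correct = rasch_probability(ability, difficulty)
        prob *= p_correct if response == 1 else 1.0 - p_correct
    return prob


def conditional_pattern_probabilities(total, ability, difficulties):
    """P(pattern | total score) for every pattern that yields that total."""
    patterns = [
        p for p in product((0, 1), repeat=len(difficulties)) if sum(p) == total
    ]
    joint = {p: pattern_probability(p, ability, difficulties) for p in patterns}
    norm = sum(joint.values())
    return {p: value / norm for p, value in joint.items()}


# Hypothetical item difficulties in logits, ordered from easy to hard.
difficulties = [-1.0, 0.0, 1.0]

# Two hypothetical examinees of different ability who both score 2 out of 3.
low = conditional_pattern_probabilities(2, -0.5, difficulties)
high = conditional_pattern_probabilities(2, 1.5, difficulties)

for pattern in low:
    print(pattern, round(low[pattern], 4), round(high[pattern], 4))
```

Once the total score is fixed, the conditional probabilities of the individual patterns come out the same for both examinees (up to floating-point error), which is what it means for the total score to carry all of the information about ability. With these made-up difficulties, the pattern (1, 1, 0), in which the two easier items are answered correctly and the hardest is missed, is the most probable way of scoring 2; the other patterns remain possible but less likely, which is the probabilistic counterpart of the Guttman structure.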
To borrow an argument and analogy from Thurstone (1928), it is impossible to represent the entire complexity of an object as a single value, even for something as concrete as a table (p. 215). There is always a certain loss of information in any measurement. That is, it is impossible to measure a table without specifying what aspect of the table (weight, height, etc.) will be measured. Taking the analogy further, even though most people would agree that it is perfectly acceptable to talk about the weight of tables and make comparisons between them on that basis, this does not imply that a table’s weight is readily and consistently determinable from its constituent parts or properties. Certainly, tables of the same weight can differ in terms of color, size, material used in construction, style, degree of wear, number of missing parts, etc., with many of those factors contributing directly to the overall weight. Nevertheless, even though there are many factors which contribute to the weight of the table, it is not required that they all be specified or even present in consistent proportions for a useful measurement of weight
Similar resources
Psychometric properties of Geriatric Depression Scale (GDS) among elderlies in Tehran using multidimensional Rasch model
Introduction and purpose: The purpose of this study was to examine the psychometric properties of the Geriatric Depression Scale (GDS) when applied to the elderly of Tehran. This research is applied-developmental, descriptive and quantitative. Materials and Methods: The research population was Tehrani elderlies, among which 400 people responded to the Geriatric Depression Scale voluntarily and...
The Impact of Raters’ and Test Takers’ Gender on Oral Proficiency Assessment: A Case of Multifaceted Rasch Analysis
The application of Multifaceted Rasch Measurement (MFRM) in rating test takers’ oral language proficiency has been investigated in some previous studies (e.g., Winke, Gass, & Myford, 2012). However, little research so far has ever documented the effect of test takers’ genders on their oral performances and few studies have investigated the relationship between the impact of raters’ gender on th...
Implicational Scaling of Reading Comprehension Construct: Is it Deterministic or Probabilistic?
In English as a Second Language Teaching and Testing situations, it is common to infer about learners’ reading ability based on his or her total score on a reading test. This assumes the unidimensional and reproducible nature of reading items. However, few researches have been conducted to probe the issue through psychometric analyses. In the present study, the IELTS exemplar module C (1994) wa...
Investigating the Effect of Self-, Peer-, and Teacher Assessment in Second Language Writing over Time: A Multifaceted Rasch Approach
This study investigated the accuracy of scores assigned by self-, peer-, and teacher assessors over time. Thirty-three English majors who were taking paragraph development course at Vali-e-Asr University of Rafsanjan and two instructors who had been teaching essay writing for at least two years at university, participated in the study. After receiving instructions on paragraph development, part...
Application of Rasch model for evaluating the quality of life in blind war veterans
Background: Quality of life evaluates the general well-being of individuals and it can be considered as one of the important aspects in programming and giving service to disabled people. Blindness is one of the most important kinds of physical disability that has a direct effect on quality of life, so this study aimed to explore how war blindness influences the quality of life. Methods: I...